Session A1

Learning Algorithm

Conference
10:50 AM — 11:50 AM HKT
Local
Dec 1 Tue, 6:50 PM — 7:50 PM PST

LTG-LSM: The Optimal Structure in LSM-tree Combined with Reading Hotness

Jiaping Yu, Huahui Chen, Jiangbo Qian and Yihong Dong

A growing number of KV storage systems have adopted the Log-Structured Merge-tree (LSM-tree) for its excellent write performance. However, the high write amplification of the LSM-tree has long been a difficult problem to solve. The reason is that the traditional LSM-tree design under-utilizes the data distribution of queries, and its design space does not consider read and write performance concurrently, so improving one may sacrifice the other. When advancing the write performance of the LSM-tree, we can only conservatively select design patterns that limit the impact on read throughput, resulting in limited improvement. To address these shortcomings, we propose a new LSM-tree structure, the Leveling-Tiering-Grouped LSM-tree (LTG-LSM), which incorporates reading hotness. LTG-LSM maintains a hotness prediction model at each level of the LSM-tree, and the structure of each newly generated disk component is determined by its predicted hotness. Finally, a dedicated compaction algorithm handles compactions between components with different structures and adapts to workload hotness changes. Experiments show that the proposed scheme significantly reduces the write amplification of the original LSM-tree (by up to about 71%) with almost no sacrifice of read performance, and improves write throughput (by up to about 24%) on workloads with different configurations.
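The abstract does not spell out the structure-selection rule; as an illustrative sketch (the function name and the 0.5 threshold are assumptions, not from the paper), the per-component choice driven by predicted hotness could look like:

```python
def choose_structure(predicted_hotness, threshold=0.5):
    """Pick a layout for a newly generated disk component.

    Read-hot components favor leveling (fewer overlapping runs, so
    cheaper lookups); cold components favor tiering (lazier merges,
    so less write amplification). The threshold is illustrative only.
    """
    return "leveling" if predicted_hotness >= threshold else "tiering"
```

A per-level hotness prediction model, as the abstract describes, would supply `predicted_hotness` for each new component.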

Improved MapReduce Load Balancing through Distribution-Dependent Hash Function Optimization

Zafar Ahmad, Sharmila Duppala, Rezaul Chowdhury and Steven Skiena

Load balancing of skewed data in MapReduce systems like Hadoop is a well-studied problem. Many heuristics already exist to improve the load balance of the reducers, thereby reducing overall execution time. In this paper, we propose a lightweight optimization approach for MapReduce systems to minimize the makespan of repetitive tasks involving a typical frequency distribution.
Our idea is to analyze the observed frequency distribution for the given task to identify an optimal offset parameter c to add to the hash function so as to minimize makespan. For two different bucketing methods, modulo labeling and consecutive binning, we present efficient algorithms for finding the optimal value of c. Finally, we present simulation results for both bucketing methods. The results vary with the data distribution and the number of reducers, but makespan is reduced by 20% on average for power-law distributions. The results are confirmed with experiments on well-known real-world data sets.
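As a rough sketch of the idea (the abstract does not give the exact hash family; the consecutive-binning variant below, with keys in [0, M) split into R equal-width bins, is an assumption for illustration), the offset c can be found by brute force once the frequency distribution has been observed:

```python
from collections import defaultdict

def makespan(freqs, R, M, c):
    """Max reducer load under consecutive binning: keys in [0, M) are
    shifted by offset c (mod M) and split into R equal-width bins."""
    width = M // R
    load = defaultdict(int)
    for key, f in freqs.items():
        load[((key + c) % M) // width] += f
    return max(load.values())

def best_offset(freqs, R, M):
    """Exhaustively try each offset c (the paper gives faster algorithms;
    brute force suffices for illustration)."""
    return min(range(M), key=lambda c: makespan(freqs, R, M, c))
```

With a skewed distribution such as `{0: 10, 1: 10, 2: 1, 3: 1}` and two reducers, the zero offset puts both heavy keys into one bin, while a shifted offset splits them evenly.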

Efficient Sparse-Dense Matrix-Matrix Multiplication on GPUs Using the Customized Sparse Storage Format

Shaohuai Shi, Qiang Wang and Xiaowen Chu

Multiplication of a sparse matrix by a dense matrix (SpDM) is widely used in areas such as scientific computing and machine learning. However, existing work overlooks the performance optimization of SpDM on modern many-core architectures like GPUs. Sparse storage data structures keep sparse matrices in a memory-saving format, but they complicate optimizing SpDM on modern GPUs: the irregular data access of the sparse structure results in lower resource utilization and poorer performance. In this paper, we use the roofline performance model of GPUs to design an efficient SpDM algorithm called GCOOSpDM, which exploits coalesced global memory access, fast shared memory reuse, and more operations per byte of global memory traffic. Experiments are conducted on three Nvidia GPUs (GTX 980, GTX Titan X Pascal, and Tesla P100) with a large number of matrices, including a public dataset and randomly generated matrices. Experimental results show that GCOOSpDM achieves a 1.5-8× speedup over Nvidia's cuSPARSE library on many matrices.
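GCOOSpDM's custom GCOO format and CUDA kernels are not reproduced here; as a reference point, a plain CSR-based SpDM in Python pins down the computation being optimized (the row-of-B accumulation pattern is what coalesced access and shared-memory reuse accelerate on a GPU):

```python
import numpy as np

def spdm_csr(values, col_idx, row_ptr, B):
    """Sparse (CSR) x dense multiply: C = A @ B.
    values/col_idx/row_ptr encode the sparse matrix A row by row."""
    n_rows = len(row_ptr) - 1
    C = np.zeros((n_rows, B.shape[1]))
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # each nonzero A[i, col_idx[k]] scales one full row of B
            C[i] += values[k] * B[col_idx[k]]
    return C
```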

Session Chair

Gongpu Chen (Chinese University of Hong Kong)

Session A2

Algorithms for Cloud Systems

Conference
10:50 AM — 12:10 PM HKT
Local
Dec 1 Tue, 6:50 PM — 8:10 PM PST

Joint Service Placement and Request Scheduling for Multi-SP Mobile Edge Computing Network

Zhengwei Lei, Hongli Xu, Liusheng Huang and Zeyu Meng

Mobile edge computing (MEC), as an emerging computing paradigm, pushes services away from the centralized remote cloud to distributed edge servers deployed by multiple service providers (SPs), improving user experience and reducing the communication burden on the core network. However, this distributed computing architecture also brings new challenges. In a multi-SP MEC system, an SP prefers to use the edge servers it deployed itself rather than those of other SPs, which both improves service quality and reduces processing cost. The service placement and request scheduling strategies directly affect the revenue of SPs. Since service popularity changes over time and the resources of edge servers are limited, the system must make service placement and request scheduling decisions dynamically to serve users well. Owing to the lack of long-term prior knowledge and the presence of binary decision variables, placing services and scheduling requests to boost the profit of SPs is a challenging problem. We formally formulate this joint optimization problem and propose an efficient online algorithm. First, we invoke Lyapunov optimization to convert the long-term optimization problem into a series of subproblems; then a dual-decomposition algorithm is used to solve each subproblem. Experimental results show that the proposed algorithm achieves nearly optimal performance and raises profit by 25% and 70% compared to greedy and Top-K algorithms, respectively.

A similarity clustering-based deduplication strategy in cloud storage systems

Saiqin Long, Zhetao Li, Zihao Liu, Qingyong Deng, Sangyoon Oh and Nobuyoshi Komuro

Deduplication is a data redundancy elimination technique designed to save system storage resources by reducing redundant data in cloud storage systems. With the development of cloud computing technology, deduplication has been increasingly applied in cloud data centers. However, traditional techniques face great challenges with big data in balancing the two conflicting goals of deduplication throughput and a high duplicate elimination ratio. This paper proposes a similarity clustering-based deduplication strategy (SCDS), which aims to delete more duplicate data without significantly increasing system overhead. The main idea of SCDS is to narrow the query range of the fingerprint index through data partitioning and similarity clustering algorithms. In the data preprocessing stage, SCDS uses a data partitioning algorithm to classify similar data together. In the data deletion stage, a similarity clustering algorithm groups the fingerprint superblocks of similar data into the same cluster, and duplicate fingerprints are then detected within each cluster, speeding up their retrieval. Experiments show that the deduplication ratio of SCDS is better than that of some existing similarity deduplication algorithms, while its overhead is only slightly higher than that of some high-throughput but low-deduplication-ratio methods.
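The abstract leaves out SCDS's concrete algorithms; the toy sketch below (the function names and the min-fingerprint cluster key are assumptions, not the paper's method) shows the general pattern of restricting duplicate lookups to a similarity cluster. It also exposes the throughput/ratio trade-off the abstract mentions: duplicates that land in different clusters go undetected.

```python
import hashlib
from collections import defaultdict

def fingerprint(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()

def signature(chunk_fps):
    """Min-hash-style signature: the smallest chunk fingerprint serves
    as the cluster key, so similar superblocks tend to land together."""
    return min(chunk_fps)

def deduplicate(superblocks):
    """Store one copy per unique chunk, looking fingerprints up only
    inside the cluster of the containing superblock."""
    clusters = defaultdict(set)   # signature -> fingerprints already stored
    stored = []
    for sb in superblocks:
        fps = [fingerprint(c) for c in sb]
        key = signature(fps)
        for fp, chunk in zip(fps, sb):
            if fp not in clusters[key]:
                clusters[key].add(fp)
                stored.append(chunk)
    return stored
```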

A Fair Task Assignment Strategy for Minimizing Cost in Mobile Crowdsensing

Yujun Liu, Yongjian Yang, En Wang, Wenbin Liu, Dongming Luan, Xiaoying Sun and Jie Wu

Mobile CrowdSensing (MCS) is a promising paradigm that recruits mobile users to cooperatively perform various sensing tasks. When assigning tasks to users, most existing works consider only the fairness of users, i.e., each user's processing ability, with the goal of minimizing the assignment cost. In this paper, we argue that it is necessary not only to make full use of all users' ability to process tasks (e.g., not exceeding the maximum capacity of each user while also not letting any user idle too long), but also to satisfy the assignment frequency of all corresponding tasks (e.g., how many times each task should be assigned within the whole system time) to ensure a long-term, doubly fair, and stable participatory sensing system. Hence, to solve the task assignment problem of reasonably assigning tasks to users with limited processing ability while ensuring the assignment frequency, we first model the two fairness constraints simultaneously by converting them into user processing queues and task virtual queues, respectively. Then, we propose a Fair Task Assignment Strategy (FTAS) using Lyapunov optimization, and we prove the optimality of the proposed strategy, ensuring an upper bound on the total assignment cost and queue backlog. Finally, extensive simulations have been conducted over three real-life mobility traces: Changchun/taxi, Epfl/mobility, and Feeder. The simulation results show that, compared with other baseline approaches, the proposed strategy achieves a trade-off between minimizing cost and the fairness of tasks and users.
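FTAS itself is not specified in the abstract; the generic Lyapunov-optimization sketch below (textbook virtual-queue and drift-plus-penalty forms, with assumed names and parameters) shows how a long-term assignment-frequency constraint becomes a queue that the controller must keep bounded:

```python
def update_virtual_queue(Q, required, served):
    """Virtual queue for a long-term constraint:
    Q(t+1) = max(Q(t) + required(t) - served(t), 0).
    Q stays bounded iff the time-average service meets the requirement."""
    return max(Q + required - served, 0.0)

def drift_plus_penalty(options, Q, V):
    """Pick the action minimizing V*cost - Q*service: a large backlog Q
    pushes the controller to serve the constraint, while V weighs cost.
    Each option is a dict with assumed keys 'cost' and 'service'."""
    return min(options, key=lambda o: V * o["cost"] - Q * o["service"])
```

Intuitively, a task whose required frequency has been under-served accumulates backlog in its virtual queue, which eventually outweighs the assignment cost and forces the task to be assigned.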

Communication-Aware Load Balancing of the LU Factorization over Heterogeneous Clusters

Lucas Leandro Nesi, Lucas Mello Schnorr and Arnaud Legrand

Supercomputers are designed to be as homogeneous as possible, but it is common for a few nodes to exhibit variable performance due to processor manufacturing, and for partitions to be equipped with different types of accelerators. Data distribution over heterogeneous nodes is very challenging but essential to exploit all resources efficiently. In this article, we build upon the flexibility of task-based runtimes in managing data to study the interplay between static communication-aware data distribution strategies and dynamic scheduling of the linear-algebra LU factorization over heterogeneous sets of hybrid nodes. We propose two techniques derived from the state-of-the-art 1D×1D data distributions: first, using fewer computing nodes toward the end of the factorization to better match performance bounds and save computing power; second, carefully moving a few blocks between nodes to further optimize load balancing. We also demonstrate how 1D×1D data distributions, tailored for heterogeneous nodes, can scale better on homogeneous clusters than classical block-cyclic distributions. Validation is carried out in both real and simulated environments on homogeneous and heterogeneous platforms, demonstrating compelling performance improvements.

Session Chair

Lei Mo (Southeast University)

Session A3

Algorithms for Networks

Conference
10:50 AM — 12:10 PM HKT
Local
Dec 1 Tue, 6:50 PM — 8:10 PM PST

An Efficient Work-Stealing Scheduler for Task Dependency Graph

Chun-Xun Lin, Tsung-Wei Huang and Martin D. F. Wong

Work-stealing is a key component of many parallel task graph libraries such as Intel Threading Building Blocks (TBB) FlowGraph, Microsoft Task Parallel Library (TPL) Batch.Net, Cpp-Taskflow, and Nabbit. However, designing a correct and effective work-stealing scheduler is notoriously difficult, due to subtle implementation details of concurrency control and decentralized coordination between threads. The problem becomes even more challenging when striving for optimal thread usage on parallel workloads with complex task graphs. In this paper, we introduce an effective work-stealing scheduler for the execution of task dependency graphs. Our scheduler adopts a simple and efficient strategy to adapt the number of working threads to the available task parallelism at any time during graph execution. The strategy is provably good at preventing resource under-utilization while minimizing resource waste when tasks are scarce. We have evaluated our scheduler on both micro-benchmarks and a real-world circuit timing analysis workload, and demonstrate promising results over existing methods in terms of runtime, energy efficiency, and throughput.
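The scheduler itself is concurrent; as a sequentialized toy sketch of the classic deque discipline that work-stealing builds on (each worker pops from its own deque's bottom and steals from a victim's top), not the paper's adaptive strategy:

```python
from collections import deque
import random

def work_stealing_run(task_lists, num_workers, seed=0):
    """Single-threaded simulation of work-stealing: a worker takes from
    its own deque's bottom (LIFO, good locality) and, when empty, steals
    from a random nonempty victim's top (FIFO, grabs the oldest task)."""
    rng = random.Random(seed)
    deques = [deque(tasks) for tasks in task_lists]
    executed = []
    while any(deques):
        for w in range(num_workers):
            if deques[w]:
                executed.append(deques[w].pop())        # own bottom
            else:
                victims = [v for v in range(num_workers) if deques[v]]
                if victims:
                    executed.append(deques[rng.choice(victims)].popleft())  # steal top
    return executed
```

The paper's contribution sits on top of this discipline: deciding how many workers should be awake at any moment given the available parallelism.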

LBNN: Perceiving the State Changes of a Core Telecommunications Network via Linear Bayesian Neural Network

Yanying Lin, Kejiang Ye, Ming Chen, Naitian Deng, Tailin Wu, and Cheng-Zhong Xu

The core network is the most basic facility in the entire telecommunications network; it consists of a large number of routers, switches, and firewalls. Network management, such as re-planning routes or adjusting policies, is very important for avoiding failures. However, the timing of intervention is very challenging: intervening too early incurs unnecessary overhead, while intervening too late can cause serious disasters. In this paper, we analyze a large dataset from a real-world core telecommunications network and propose Linear Bayesian Neural Networks (LBNN) to perceive core network state changes and make decisions about network intervention. In particular, we consider three sources of complexity: the weight of the mutual effect between devices, the dependence of network states on the time dimension, and the randomness of network state changes. The entire model is extended to a probabilistic model based on the Bayesian framework to better capture the randomness and variability of the data. Experimental results on the real-world dataset show that LBNN achieves very high detection accuracy, with an average of 92.1%.

A Method to Detecting Artifact Anomalies in A Microservice Architecture

Faisal Fahmi, Pei-Shu Huang and Feng-Jian Wang

The microservice architecture is a Service-Oriented Architecture (SOA) in which a service-based application is composed of a number of smaller, independently running concurrent units, called microservices, to improve the performance and maintainability of the application. In such an application, an unexpected artifact state inside one microservice may be passed to another microservice or service unit and corrupt the whole application. Abnormal artifact operation pairs can be categorized into continuous and concurrent artifact anomalies, which indicate that two sequential or parallel operations working on the same artifact behave abnormally in a semantic sense. Recent studies showed that adopting an SP-tree structure in the detection of both kinds of anomalies inside a structured workflow reduces the computational complexity of detection to linear. In this paper, we present a series of SP-tree-based methods to detect artifact anomalies inside each microservice of an application during the design phase. Unlike the design of applications with traditional services, where each service is assumed to contain all possible types of anomalies and cannot be modified directly, the designer of a microservice works within a limited scope and can modify each microservice based on the anomaly detection results. Our contributions include identifying the artifact properties in a microservice architecture and methods to detect anomalies based on these properties, which simplify the detection of artifact anomalies in service-based applications.

Contention resolution on a restrained channel

Elijah Hradovich, Marek Klonowski and Dariusz R. Kowalski

We examine deterministic contention resolution on a multiple-access channel where packets are injected continuously by an adversary into the buffers of the n available stations, arbitrarily, at a rate of at most ρ packets per round. The aim is to transmit packets successfully and maintain system stability, that is, bounded queues, even in infinite executions. The largest injection rate for which a given contention resolution algorithm guarantees stability is called the algorithm's throughput. In contrast to previous work, we consider a channel with a strict limit k on the total number of stations allowed to transmit or listen to the channel at a given time, which can never be exceeded; we call such a channel a k-restrained channel.
We construct adaptive and full sensing protocols with optimal throughput 1 and almost optimal throughput 1 - 1/n, respectively, on a constant-restrained channel. By contrast, we show that restricted protocols based on schedules known in advance obtain throughput at most min.
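As a toy illustration of the k-restrained channel model (the round-robin window policy below is an assumption for demonstration, not one of the paper's protocols):

```python
def channel_round(transmitters, k):
    """One round on a k-restrained multiple-access channel: more than k
    active stations is disallowed by the model; exactly one transmitter
    succeeds, zero means silence, two or more collide."""
    if len(transmitters) > k:
        raise ValueError("restrained channel: at most k stations may be active")
    if len(transmitters) == 1:
        return transmitters[0]      # successful transmission
    return None                     # silence or collision

def round_robin_schedule(queues, t, k):
    """Minimal schedule-based policy sketch: in round t, a window of k
    stations is eligible and the first one with a nonempty queue transmits."""
    n = len(queues)
    window = [(t * k + i) % n for i in range(k)]
    active = [s for s in window if queues[s] > 0][:1]
    return channel_round(active, k)
```

Since only one station ever transmits under this fixed schedule, rounds where the whole eligible window is empty are wasted, which hints at why schedules known in advance cannot reach the throughput of adaptive protocols.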
We also support our theoretical analysis by simulation results of our algorithms in systems of moderate, realistic sizes and scenarios, and compare them with popular backoff protocols.

Session Chair

Fei Tong (Southeast University)
